Search for: All records
Total resources: 3
-
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to or better than other neural architecture search techniques, such as Bayesian optimization, in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
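The abstract above combines two primitives: magnitude pruning (zeroing the smallest-magnitude weights) and fake quantization (rounding weights to a low-precision grid during training). A minimal NumPy sketch of these two primitives, with hypothetical helper names (`fake_quantize`, `prune_mask`) that are illustrative only and not the paper's actual implementation:

```python
import numpy as np

def fake_quantize(w, bits=4):
    # Uniform symmetric fake quantization: snap weights to a
    # (2**bits - 1)-level grid but keep them as floats, as done
    # in quantization-aware training.
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def prune_mask(w, sparsity):
    # Magnitude pruning: mask out the `sparsity` fraction of weights
    # with the smallest absolute value.
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones_like(w, dtype=bool)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.abs(w) > thresh

# One illustrative "quantization-aware pruning" step: prune during
# training while quantizing the surviving weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
mask = prune_mask(w, sparsity=0.5)
w_qap = fake_quantize(w * mask, bits=4)
```

In an actual training loop these operations would be applied inside the forward pass (with a straight-through gradient estimator for the rounding), and the sparsity would typically be ramped up over training rather than fixed.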
-
Pappalardo, Alessandro; Umuroglu, Yaman; Blott, Michaela; Mitrevski, Jovan; Hawks, Ben; Tran, Nhan; Loncar, Vladimir; Summers, Sioni; Borras, Hendrik; Muhizi, Jules; et al. (Fermi National Accelerator Lab)
-
Blott, Michaela; Preußer, Thomas B.; Fraser, Nicholas J.; Gambardella, Giulio; O'Brien, Kenneth; Umuroglu, Yaman; Leeser, Miriam; Vissers, Kees (ACM Transactions on Reconfigurable Technology and Systems)